Improved Strong Worst-case Upper Bounds for MDP Planning
Authors
Abstract
The Markov Decision Problem (MDP) plays a central role in AI as an abstraction of sequential decision making. We contribute to the theoretical analysis of MDP planning, which is the problem of computing an optimal policy for a given MDP. Specifically, we furnish improved strong worst-case upper bounds on the running time of MDP planning. Strong bounds are those that depend only on the number of states n and the number of actions k in the specified MDP; they have no dependence on affiliated variables such as the discount factor and the number of bits needed to represent the MDP. Worst-case bounds apply to every run of an algorithm; randomised algorithms can typically yield faster expected running times. While the special case of 2-action MDPs (that is, k = 2) has recently received some attention, bounds for general k have remained to be improved for several decades. Our contributions are to this general case. For k ≥ 3, the tightest strong upper bound shown to date for MDP planning belongs to a family of algorithms called Policy Iteration. This bound is only a polynomial improvement over a trivial bound of poly(n, k) · k^n [Mansour and Singh, 1999]. In this paper, we generalise a contrasting algorithm called the Fibonacci Seesaw, and derive a bound of poly(n, k) · k^(0.6834n). The key construct that we use is a template to map algorithms for the 2-action setting to the general setting. Interestingly, this idea can also be used to design Policy Iteration algorithms with a running time upper bound of poly(n, k) · k^(0.7207n). Both our results improve upon bounds that have stood for several decades.
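As a concrete illustration of what MDP planning and Policy Iteration refer to, here is a minimal sketch of Howard's Policy Iteration in Python; it is not the paper's algorithm, and the transition tensor T, reward matrix R, discount factor gamma, and the tiny random example are all illustrative assumptions. The strong bounds discussed above count iterations independently of the discount factor and bit precision; the discounted setting below is only to keep the evaluation step a single linear solve.

```python
import numpy as np

def policy_iteration(T, R, gamma=0.95):
    """Howard's Policy Iteration on a discounted MDP with n states and k actions.
    T[s, a, s'] is the transition probability, R[s, a] the expected reward."""
    n, k, _ = T.shape
    pi = np.zeros(n, dtype=int)               # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * T_pi) V = R_pi exactly.
        T_pi = T[np.arange(n), pi]            # n x n transitions under pi
        R_pi = R[np.arange(n), pi]            # n rewards under pi
        V = np.linalg.solve(np.eye(n) - gamma * T_pi, R_pi)
        # Policy improvement: switch every improvable state to a greedy action.
        Q = R + gamma * (T @ V)               # Q[s, a] = R[s, a] + gamma * E[V(s')]
        pi_next = Q.argmax(axis=1)
        if np.array_equal(pi_next, pi):       # no improvable state left: pi is optimal
            return pi, V
        pi = pi_next

# Tiny random MDP: n = 4 states, k = 3 actions (purely illustrative numbers).
rng = np.random.default_rng(0)
T = rng.random((4, 3, 4))
T /= T.sum(axis=2, keepdims=True)             # normalise each row into a distribution
R = rng.random((4, 3))
pi_star, V_star = policy_iteration(T, R)
print("optimal policy:", pi_star)
```

The running-time bounds in the abstract count how many improvement steps such a loop can take in the worst case; the trivial bound k^n comes from the fact that no policy is ever revisited.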
Similar resources
Computing and Using Lower and Upper Bounds for Action Elimination in MDP Planning
We describe a way to improve the performance of MDP planners by modifying them to use lower and upper bounds to eliminate non-optimal actions during their search. First, we discuss a particular state-abstraction formulation of MDP planning problems and how to use that formulation to compute bounds on the Q-functions of those planning problems. Then, we describe how to incorporate those bounds i...
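A minimal sketch of the action-elimination rule that the snippet above describes, under the assumption that we already hold elementwise lower bounds L[s, a] and upper bounds U[s, a] on the optimal Q-function; the array names and toy numbers are illustrative, not taken from that paper. An action is provably non-optimal at a state once its upper bound drops below the best lower bound available at that state.

```python
import numpy as np

def eliminate_actions(L, U):
    """Keep action a at state s only if U[s, a] >= max_a' L[s, a'],
    i.e. unless some other action is guaranteed to be at least as good."""
    best_lower = L.max(axis=1, keepdims=True)   # strongest guarantee per state
    return U >= best_lower                       # boolean mask of surviving actions

# Illustrative bounds for 2 states and 3 actions.
L = np.array([[1.0, 0.2, 0.5],
              [0.0, 0.9, 0.3]])
U = np.array([[1.5, 0.8, 1.2],
              [0.4, 1.1, 0.7]])
# At state 0, actions 0 and 2 survive; at state 1 only action 1 does.
print(eliminate_actions(L, U))
```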
Posterior sampling for reinforcement learning: worst-case regret bounds
We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov Decision Process (MDP) is communicating with a finite, though unknown, diameter. Our main result is a high probability regret upper bound of Õ(D√SAT) for any communicating MDP with S states, A actions and diameter D, when T ≥ SA. Here, reg...
Optimistic posterior sampling for reinforcement learning: worst-case regret bounds
We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov Decision Process (MDP) is communicating with a finite, though unknown, diameter. Our main result is a high probability regret upper bound of Õ(D√SAT) for any communicating MDP with S states, A actions and diameter D, when T ≥ SA. Here, reg...
Near-optimal Reinforcement Learning in Factored MDPs
Any learning algorithm over Markov decision processes (MDPs) will have worst-case regret Ω(√SAT), where T is the elapsed time and S and A are the cardinalities of the state and action spaces. In many settings of interest S and A may be so huge that it is impossible to guarantee good performance for an arbitrary MDP on any practical timeframe T. We show that, if we know the true system can be...
Upper and Lower Bounds on the Time Complexity of Infinite-Domain CSPs
The constraint satisfaction problem (CSP) is a widely studied problem with numerous applications in computer science. For infinite-domain CSPs, there are many results separating tractable and NP-hard cases, while upper bounds on the time complexity of hard cases are virtually unexplored. Hence, we initiate a study of the worst-case time complexity of such CSPs. We analyse backtracking algorithms ...
Journal title:
Volume / Issue:
Pages: -
Publication year: 2017